social determinant
Algorithmic Fairness amid Social Determinants: Reflection, Characterization, and Approach
Tang, Zeyu, London, Alex John, Kasirzadeh, Atoosa, Koyejo, Sanmi, Spirtes, Peter, Zhang, Kun
Social determinants are variables that, while not directly pertaining to any specific individual, capture key aspects of contexts and environments that have direct causal influences on certain attributes of an individual. Previous algorithmic fairness literature has primarily focused on sensitive attributes, often overlooking the role of social determinants. Our paper addresses this gap by introducing formal and quantitative rigor into a space that has been shaped largely by qualitative proposals regarding the use of social determinants. To demonstrate theoretical perspectives and practical applicability, we examine a concrete setting of college admissions, using region as a proxy for social determinants. Our approach leverages a region-based analysis with Gamma distribution parameterization to model how social determinants impact individual outcomes. Despite its simplicity, our method quantitatively recovers findings that resonate with nuanced insights from previous qualitative debates, insights that existing algorithmic fairness approaches often miss. Our findings suggest that mitigation strategies centering solely around sensitive attributes may introduce new structural injustice when addressing existing discrimination. Considering both sensitive attributes and social determinants facilitates a more comprehensive explication of benefits and burdens experienced by individuals from diverse demographic backgrounds as well as contextual environments, which is essential for understanding and achieving fairness effectively and transparently.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Law > Civil Rights & Constitutional Law (1.00)
- Government > Regional Government > North America Government > United States Government (0.93)
- Health & Medicine > Therapeutic Area (0.93)
- Education > Educational Setting > Higher Education (0.88)
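The abstract says region-level social determinants are modeled with a Gamma distribution parameterization but does not state how the parameters are estimated. As a minimal sketch, assuming strictly positive outcome values and a method-of-moments fit (mean = kθ, variance = kθ²), one Gamma can be fitted per region. The function names and the (region, value) record layout are hypothetical, not the authors' code.

```python
from statistics import fmean, pvariance


def fit_gamma_moments(values):
    """Method-of-moments Gamma fit (assumed estimator): returns (shape k, scale theta).

    From mean = k * theta and variance = k * theta**2:
    k = mean**2 / variance and theta = variance / mean.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Gamma support is strictly positive")
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    if var == 0:
        raise ValueError("need non-constant data to fit a Gamma")
    return mean ** 2 / var, var / mean


def fit_by_region(records):
    """Group hypothetical (region, value) pairs and fit one Gamma per region."""
    grouped = {}
    for region, value in records:
        grouped.setdefault(region, []).append(value)
    return {region: fit_gamma_moments(vals) for region, vals in grouped.items()}
```

For example, `fit_by_region([("A", 1), ("A", 2), ("A", 3), ("B", 10), ("B", 20), ("B", 30)])` gives both regions the same shape (6.0) but different scales (1/3 and 10/3), so the regions differ in level rather than in relative spread.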
SynthEHR-Eviction: Enhancing Eviction SDoH Detection with LLM-Augmented Synthetic EHR Data
Yao, Zonghai, Zhao, Youxia, Mitra, Avijit, Levy, David A., Druhl, Emily, Tsai, Jack, Yu, Hong
Eviction is a significant yet understudied social determinant of health (SDoH), linked to housing instability, unemployment, and mental health. While eviction appears in unstructured electronic health records (EHRs), it is rarely coded in structured fields, limiting downstream applications. We introduce SynthEHR-Eviction, a scalable pipeline combining LLMs, human-in-the-loop annotation, and automated prompt optimization (APO) to extract eviction statuses from clinical notes. Using this pipeline, we created the largest public eviction-related SDoH dataset to date, comprising 14 fine-grained categories. Fine-tuned LLMs (e.g., Qwen2.5, LLaMA3) trained on SynthEHR-Eviction achieved Macro-F1 scores of 88.8% (eviction) and 90.3% (other SDoH) on human-validated data, outperforming GPT-4o-APO (87.8%, 87.3%), GPT-4o-mini-APO (69.1%, 78.1%), and BioBERT (60.7%, 68.3%), while enabling cost-effective deployment across various model sizes. The pipeline reduces annotation effort by over 80%, accelerates dataset creation, enables scalable eviction detection, and generalizes to other information extraction tasks.
- North America > United States > Texas > Harris County > Houston (0.04)
- North America > United States > Massachusetts > Worcester County > Worcester (0.04)
- North America > United States > Massachusetts > Middlesex County > Lowell (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
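Macro-F1, the metric reported above, averages per-class F1 with equal weight, so rare eviction categories count as much as frequent ones. The sketch below uses the standard definition; whether the authors' evaluation includes labels that appear only in predictions is an assumption, and the example labels are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean of precision and recall.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

With gold labels `["a", "a", "b", "b"]` and predictions `["a", "b", "b", "b"]`, class `a` scores 2/3 and class `b` scores 4/5, giving a Macro-F1 of 11/15 (about 0.733).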
Improving Social Determinants of Health Documentation in French EHRs Using Large Language Models
Bazoge, Adrien, Beaufils, Pacôme Constant dit, Hmitouch, Mohammed, Bourcier, Romain, Morin, Emmanuel, Dufour, Richard, Daille, Béatrice, Gourraud, Pierre-Antoine, Karakachoff, Matilde
Social determinants of health (SDoH) significantly influence health outcomes, shaping disease progression, treatment adherence, and health disparities. However, their documentation in structured electronic health records (EHRs) is often incomplete or missing. This study presents an approach based on large language models (LLMs) for extracting 13 SDoH categories from French clinical notes. We trained Flan-T5-Large on annotated social history sections from clinical notes at Nantes University Hospital, France. We evaluated the model at two levels: (i) identification of SDoH categories and associated values, and (ii) extraction of detailed SDoH with associated temporal and quantitative information. The model performance was assessed across four datasets, including two that we publicly release as open resources. The model achieved strong performance for identifying well-documented categories such as living condition, marital status, descendants, job, tobacco, and alcohol use (F1 score > 0.80). Performance was lower for categories with limited training data or highly variable expressions, such as employment status, housing, physical activity, income, and education. Our model identified 95.8% of patients with at least one SDoH, compared to 2.8% for ICD-10 codes from structured EHR data. Our error analysis showed that performance limitations were linked to annotation inconsistencies, reliance on an English-centric tokenizer, and reduced generalizability due to the model being trained on social history sections only. These results demonstrate the effectiveness of NLP in improving the completeness of real-world SDoH data in a non-English EHR system.
- Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.25)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Singapore (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
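The 95.8% versus 2.8% comparison above is a patient-level coverage rate: the share of patients with at least one SDoH found by a given source. A minimal sketch on hypothetical data follows; the mapping layout and function name are assumptions, and the paper may define coverage differently.

```python
def coverage(extractions):
    """Fraction of patients with at least one SDoH category found.

    `extractions` maps a patient id to the set of SDoH categories
    detected for that patient (an empty set if none).
    """
    if not extractions:
        return 0.0
    covered = sum(1 for categories in extractions.values() if categories)
    return covered / len(extractions)


# Hypothetical example: the same four patients seen by an NLP model and by ICD-10 codes.
nlp = {"p1": {"tobacco"}, "p2": set(), "p3": {"job", "alcohol"}, "p4": {"marital status"}}
icd = {"p1": set(), "p2": set(), "p3": set(), "p4": {"housing"}}
```

Here `coverage(nlp)` is 0.75 and `coverage(icd)` is 0.25; computing both sources on the same patient set keeps the comparison like-for-like.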
The Most Important Features in Generalized Additive Models Might Be Groups of Features
Bosschieter, Tomas M., Franca, Luis, Wolk, Jessica, Wu, Yiyuan, Mehta, Bella, Dehoney, Joseph, Kiss, Orsolya, Baker, Fiona C., Zhao, Qingyu, Caruana, Rich, Pohl, Kilian M.
While analyzing the importance of features has become ubiquitous in interpretable machine learning, the joint signal from a group of related features is sometimes overlooked or inadvertently excluded. Neglecting the joint signal could bypass a critical insight: in many instances, the most significant predictors are not isolated features, but rather the combined effect of groups of features. This can be especially problematic for datasets that contain natural groupings of features, including multimodal datasets. This paper introduces a novel approach to determine the importance of a group of features for Generalized Additive Models (GAMs) that is efficient, requires no model retraining, allows defining groups post hoc, permits overlapping groups, and remains meaningful in high-dimensional settings. Moreover, this definition offers a parallel with explained variation in statistics. We showcase properties of our method on three synthetic experiments that illustrate the behavior of group importance across various data regimes. We then demonstrate the importance of groups of features in identifying depressive symptoms from a multimodal neuroscience dataset, and study the importance of social determinants of health after total hip arthroplasty. These two case studies reveal that analyzing group importance offers a more accurate, holistic view of the medical issues compared to a single-feature analysis.
- North America > United States > Pennsylvania (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > Washington > King County > Redmond (0.04)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.88)
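The abstract says the group-importance measure parallels explained variation but does not give its formula. As an assumed stand-in, not the authors' definition, the sketch below scores a group G by Var(Σ_{j∈G} f_j) / Var(f), where f_j are the per-sample shape-function outputs of an already-fitted GAM. This keeps the listed properties: no retraining, groups chosen post hoc, and overlapping groups allowed. The feature names are hypothetical.

```python
from statistics import pvariance


def group_importance(contributions, group):
    """Assumed variance-ratio importance of a feature group in an additive model.

    `contributions` maps each feature name to its per-sample shape-function
    values f_j(x_ij) from an already-fitted model; nothing is retrained.
    """
    names = list(contributions)
    n = len(contributions[names[0]])
    # Model output (minus intercept) is the sum of all shape functions.
    total = [sum(contributions[f][i] for f in names) for i in range(n)]
    # Joint contribution of the chosen group; groups may overlap.
    part = [sum(contributions[f][i] for f in group) for i in range(n)]
    total_var = pvariance(total)
    return pvariance(part) / total_var if total_var else 0.0


contrib = {"f1": [1, -1, 1, -1], "f2": [1, -1, 1, -1], "f3": [0, 0, 0, 0]}
```

With these hypothetical contributions, each of `f1` and `f2` alone scores 0.25, but the group `["f1", "f2"]` scores 1.0. Because the two contributions move together, the group carries more signal than the sum of its members, which is the joint-signal effect the abstract describes.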
Spurious Correlations and Beyond: Understanding and Mitigating Shortcut Learning in SDOH Extraction with Large Language Models
Sakib, Fardin Ahsan, Zhu, Ziwei, Grace, Karen Trister, Yetisgen, Meliha, Uzuner, Ozlem
Social determinants of health (SDOH) extraction from clinical text is critical for downstream healthcare analytics. Although large language models (LLMs) have shown promise, they may rely on superficial cues leading to spurious predictions. Using the MIMIC portion of the SHAC (Social History Annotation Corpus) dataset and focusing on drug status extraction as a case study, we demonstrate that mentions of alcohol or smoking can falsely induce models to predict current/past drug use where none is present, while also uncovering concerning gender disparities in model performance. We further evaluate mitigation strategies, such as prompt engineering and chain-of-thought reasoning, to reduce these false positives, providing insights into enhancing LLM reliability in health domains.
- North America > United States (0.04)
- Asia > Vietnam > Long An Province (0.04)
- Africa > Uganda (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.73)
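The shortcut effect described above can be checked by comparing the false-positive rate for drug use among notes with and without alcohol or smoking mentions. The sketch assumes a hypothetical record layout of `(has_cue, true_drug_use, predicted_drug_use)`; the same stratification works with gender in place of the cue.

```python
from collections import defaultdict


def fpr_by_cue(records):
    """False-positive rate of drug-use predictions, split by cue presence.

    Each record is (has_alcohol_or_smoking_mention, true_drug_use, predicted_drug_use).
    FPR = FP / (FP + TN), computed only over notes with no true drug use.
    """
    fp = defaultdict(int)
    negatives = defaultdict(int)
    for has_cue, truth, pred in records:
        if not truth:
            negatives[has_cue] += 1
            fp[has_cue] += bool(pred)
    return {cue: fp[cue] / negatives[cue] for cue in negatives}


# Hypothetical predictions: two of three cue-bearing negatives are flagged, versus one of two without the cue.
records = [
    (True, False, True), (True, False, True), (True, False, False),
    (False, False, False), (False, False, True), (False, True, True),
]
```

A higher FPR in the cue-bearing stratum (2/3 versus 1/2 here) is the signature of the spurious correlation; a mitigation strategy should narrow that gap.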
Integration of Large Language Models and Traditional Deep Learning for Social Determinants of Health Prediction
Landes, Paul, Sun, Jimeng, Cross, Adam
Social Determinants of Health (SDoH) are economic, social and personal circumstances that affect or influence an individual's health status. SDoHs have been shown to correlate with wellness outcomes and are therefore useful to physicians in diagnosing diseases and in decision-making. In this work, we automatically extract SDoHs from clinical text using traditional deep learning and Large Language Models (LLMs) to identify the advantages and disadvantages of each on an existing publicly available dataset. Our models outperform a previous reference point on multilabel SDoH classification by 10 points, and we present a method and model that drastically speed up classification (a 12X improvement in execution time) by eliminating expensive LLM processing. The method offers a more nimble and efficient solution that leverages the LLM for precision and traditional deep learning methods for efficiency. We also show highly performant results on a dataset supplemented with synthetic data, along with several traditional deep learning models that outperform LLMs. Our models and methods offer the next iteration of automatic prediction of SDoHs that impact at-risk patients.
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Health & Medicine > Consumer Health (0.66)
- Health & Medicine > Therapeutic Area > Endocrinology (0.46)
Early Prediction of Alzheimer's and Related Dementias: A Machine Learning Approach Utilizing Social Determinants of Health Data
Kindo, Bereket, Restar, Arjee, Tran, Anh
Alzheimer's disease and related dementias (AD/ADRD) represent a growing healthcare crisis affecting over 6 million Americans. While genetic factors play a crucial role, emerging research reveals that social determinants of health (SDOH) significantly influence both the risk and progression of cognitive impairment, as measured by cognitive scores and cognitive decline. This report examines how these social, environmental, and structural factors impact cognitive health trajectories, with a particular focus on Hispanic populations, who face disproportionate risk for AD/ADRD. Using data from the Mexican Health and Aging Study (MHAS) and its cognitive assessment substudy (Mex-Cog), we employed ensembles of regression tree models to predict 4-year and 9-year cognitive scores and cognitive decline based on SDOH. This approach identified key predictive SDOH factors to inform potential multilevel interventions to address cognitive health disparities in this population.
- North America > United States > California (0.04)
- North America > United States > Virginia > Fairfax County > McLean (0.04)
- North America > United States > New York > Albany County > Albany (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Neurology > Dementia (1.00)
- Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)
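The abstract does not say which ensemble of regression trees was used (for example, random forests or boosting) or which SDOH features went in. As a minimal instance of the technique only, the sketch below bags depth-1 regression trees (stumps) on a single hypothetical numeric feature. Each stump is fitted to a bootstrap resample, and the ensemble prediction is the average across stumps.

```python
import random
from statistics import fmean


def sse(ys, mean):
    return sum((y - mean) ** 2 for y in ys)


def fit_stump(xs, ys):
    """Depth-1 regression tree: the single split on x that minimises squared error."""
    mean = fmean(ys)
    best_err, best = sse(ys, mean), (None, mean, mean)  # no split: predict the mean
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        lm, rm = fmean(left), fmean(right)
        err = sse(left, lm) + sse(right, rm)
        if err < best_err:
            best_err, best = err, (threshold, lm, rm)
    return best


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    if threshold is None or x <= threshold:
        return left_mean
    return right_mean


def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Bagging: fit each stump on a bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_ensemble(stumps, x):
    """Ensemble prediction is the average of the individual trees."""
    return fmean(predict_stump(s, x) for s in stumps)
```

Real analyses would use deeper trees over many SDOH features. Averaging over bootstrap resamples is what makes the ensemble more stable than any single tree.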
EchoQA: A Large Collection of Instruction Tuning Data for Echocardiogram Reports
Moukheiber, Lama, Moukheiber, Mira, Moukheiber, Dana, Ju, Jae-Woo, Lee, Hyung-Chul
We introduce a novel question-answering (QA) dataset using echocardiogram reports sourced from the Medical Information Mart for Intensive Care database. This dataset is specifically designed to enhance QA systems in cardiology, consisting of 771,244 QA pairs addressing a wide array of cardiac abnormalities and their severity. We compare large language models (LLMs), including open-source and biomedical-specific models for zero-shot evaluation, and closed-source models for zero-shot and three-shot evaluation. Our results show that fine-tuning LLMs improves performance across various QA metrics, validating the value of our dataset. Clinicians also qualitatively evaluate the best-performing model to assess the LLM responses for correctness. Further, we conduct fine-grained fairness audits to assess the bias-performance trade-off of LLMs across various social determinants of health. Our objective is to propel the field forward by establishing a benchmark for LLM AI agents aimed at supporting clinicians with cardiac differential diagnoses, thereby reducing the documentation burden that contributes to clinician burnout and enabling healthcare professionals to focus more on patient care.
- North America > United States (0.28)
- Asia > China (0.14)
- Asia > Middle East (0.14)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.68)
Unmasking Societal Biases in Respiratory Support for ICU Patients through Social Determinants of Health
Moukheiber, Mira, Moukheiber, Lama, Moukheiber, Dana, Lee, Hyung-Chul
Affiliations: Massachusetts Institute of Technology (Mira Moukheiber, Lama Moukheiber, Dana Moukheiber); Seoul National University College of Medicine, Seoul National University Hospital, Department of Anesthesiology and Pain Medicine (Hyung-Chul Lee, vital@snu.ac.kr).
In critical care settings, where precise and timely interventions are crucial for health outcomes, evaluating disparities in patient outcomes is important. Current approaches often fall short in comprehensively understanding and evaluating the impact of respiratory support interventions on individuals affected by social determinants of health. Attributes such as gender, race, and age are commonly assessed and essential, but they provide only a partial view of the complexities faced by diverse populations. In this study, we focus on two clinically motivated tasks: prolonged mechanical ventilation and successful weaning. We also perform fairness audits on the models' predictions across demographic groups and social determinants of health to better understand health inequities in respiratory interventions in the intensive care unit. Finally, we release a temporal benchmark dataset, verified by clinical experts, to enable benchmarking of clinical respiratory intervention tasks.
Introduction: Critically ill patients often find themselves in the intensive care unit (ICU) seeking specialized support for respiratory distress [Doyle et al., 1995; Ware and Matthay, 2000]. Despite advances in supportive treatments, the in-hospital mortality rate remains 40% for conditions such as acute lung injury and acute respiratory distress syndrome [Rubenfeld et al., 2005; Sweatt and Levitt, 2014].
- Asia > South Korea > Seoul > Seoul (0.44)
- Asia > Middle East > Israel (0.04)
- South America > Brazil (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
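The abstract does not list which fairness metrics the audits use. One common choice for binary tasks such as prolonged mechanical ventilation is to compare recall (true-positive rate) across groups, in the style of equal opportunity. The sketch assumes a hypothetical SDoH grouping labelled `group_a` and `group_b`.

```python
from collections import defaultdict


def recall_by_group(y_true, y_pred, groups):
    """Per-group true-positive rate (recall) for a binary task.

    Groups with no positive cases are omitted, since recall is undefined there.
    """
    tp = defaultdict(int)
    pos = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        if t:
            pos[g] += 1
            tp[g] += bool(p)
    return {g: tp[g] / pos[g] for g in pos}


def max_gap(rates):
    """Largest difference between any two groups' rates (0 if fewer than two groups)."""
    return max(rates.values()) - min(rates.values()) if rates else 0.0


# Hypothetical audit data: labels, model predictions, and an SDoH grouping per patient.
y_true = [1, 1, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
groups = ["group_a", "group_a", "group_b", "group_b", "group_b", "group_a"]
```

Here `group_a` has recall 2/3 and `group_b` has recall 1/2, a gap of 1/6. Auditing on an SDoH grouping, not only on gender, race, or age, is what lets disparities tied to social context surface.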
Mining Social Determinants of Health for Heart Failure Patient 30-Day Readmission via Large Language Model
Shao, Mingchen, Kang, Youjeong, Hu, Xiao, Kwak, Hyunjung Gloria, Yang, Carl, Lu, Jiaying
Heart Failure (HF) affects millions of Americans and leads to high readmission rates, posing significant healthcare challenges. While Social Determinants of Health (SDOH) such as socioeconomic status and housing stability play critical roles in health outcomes, they are often underrepresented in structured EHRs and hidden in unstructured clinical notes. This study leverages advanced large language models (LLMs) to extract SDOHs from clinical text and uses logistic regression to analyze their association with HF readmissions.
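To make the logistic-regression step concrete, consider the simplest case: a single binary SDOH indicator, such as a hypothetical housing-instability flag, and 30-day readmission. In that case the fitted coefficient satisfies exp(β) = (a·d)/(b·c) for the 2×2 table, so the odds ratio can be read off directly. The study's models likely include several predictors, and their coefficients are then adjusted and no longer reduce to this raw ratio. The record layout below is an assumption.

```python
import math


def odds_ratio(records):
    """Odds ratio of readmission for patients with vs. without an SDOH factor.

    Each record is (has_factor, readmitted). For a logistic regression with this
    single binary predictor, exp(coefficient) equals this 2x2 odds ratio.
    """
    a = sum(1 for f, r in records if f and r)          # factor, readmitted
    b = sum(1 for f, r in records if f and not r)      # factor, not readmitted
    c = sum(1 for f, r in records if not f and r)      # no factor, readmitted
    d = sum(1 for f, r in records if not f and not r)  # no factor, not readmitted
    if 0 in (a, b, c, d):
        raise ValueError("a zero cell makes the odds ratio undefined or infinite")
    return (a * d) / (b * c)


def log_odds_coefficient(records):
    """The corresponding logistic-regression coefficient (log odds ratio)."""
    return math.log(odds_ratio(records))


# Hypothetical cohort: 6/10 readmitted with the factor, 3/10 without.
cohort = [(True, True)] * 6 + [(True, False)] * 4 + [(False, True)] * 3 + [(False, False)] * 7
```

For this cohort the odds ratio is (6·7)/(4·3) = 3.5, meaning the odds of readmission are 3.5 times higher when the factor is present.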